Improving web page clustering using Probabilistic Latent Semantic Analysis
Abstract
Traditional clustering algorithms are usually based on the bag-of-words (BOW) approach. A notorious disadvantage of the BOW model is that it ignores the semantic relationships among words. As a result, if two documents use different sets of core words to represent the same topic, they may be assigned to different clusters, even though the core words they use are probably synonyms or otherwise semantically related. Conventional web page clustering techniques, which are often used to reveal the functional similarity of web pages, have other disadvantages as well. Tagging can help improve clustering performance, and several efforts have been made to exploit social tags for clustering. However, tag-based web page clustering also has drawbacks. To our knowledge, all existing approaches that exploit tag information for web page clustering assume that every web page is tagged, which is a somewhat restrictive assumption. In a more realistic setting, one can only expect that the tags...
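The abstract's central argument is that BOW similarity cannot see synonyms, and the title names Probabilistic Latent Semantic Analysis (PLSA) as the remedy. The excerpt does not say how the authors apply PLSA or how tags enter their model, so the following is only a minimal, generic sketch in pure Python, assuming a toy corpus, two topics, and hypothetical helper names (`bow_vector`, `cosine`, `plsa`). It first shows that two documents with no words in common get a BOW cosine similarity of exactly zero, then fits the standard asymmetric PLSA model, P(w | d) = Σ_z P(z | d) P(w | z), by expectation-maximization so that each document is represented by its topic mixture P(z | d) instead of raw word counts.

```python
import math
import random
from collections import Counter


def bow_vector(text):
    """Bag-of-words vector: raw term counts; word order and meaning are ignored."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def _normalize(row):
    total = sum(row)
    return [x / total for x in row]


def plsa(docs, n_topics, n_iter=50, seed=0):
    """Fit PLSA, P(w | d) = sum_z P(z | d) P(w | z), by EM (illustrative sketch).

    docs: list of non-empty term-count dicts (e.g. from bow_vector).
    Returns (p_z_d, p_w_z, vocab, log_likelihoods), where
    p_z_d[d][z] = P(z | d) and p_w_z[z][i] = P(vocab[i] | z).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    # Random positive initialization; each row is a probability distribution.
    p_z_d = [_normalize([rng.random() for _ in range(n_topics)]) for _ in docs]
    p_w_z = [_normalize([rng.random() for _ in vocab]) for _ in range(n_topics)]
    history = []
    for _ in range(n_iter):
        exp_w_z = [[0.0] * len(vocab) for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in docs]
        log_lik = 0.0
        for d, counts in enumerate(docs):
            for i, w in enumerate(vocab):
                n = counts.get(w, 0)
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(z | d) P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][i] for z in range(n_topics)]
                total = sum(joint)
                log_lik += n * math.log(total)
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    exp_w_z[z][i] += resp
                    exp_z_d[d][z] += resp
        # Log-likelihood under the parameters in force before this update.
        history.append(log_lik)
        # M-step: renormalize the expected counts into new distributions.
        p_w_z = [_normalize(row) for row in exp_w_z]
        p_z_d = [_normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z, vocab, history


if __name__ == "__main__":
    # Hypothetical toy corpus: "car" and "automobile" never share a page with
    # each other's exact wording but co-occur with the same context words.
    corpus = ["car engine road", "automobile engine road", "car automobile dealer",
              "apple fruit juice", "banana apple fruit"]
    docs = [bow_vector(t) for t in corpus]
    print(cosine(bow_vector("car dealer"), bow_vector("automobile seller")))  # prints 0.0
    p_z_d, p_w_z, vocab, history = plsa(docs, n_topics=2)
    # Hard clustering: assign each page to its most probable latent topic.
    clusters = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
    print(clusters)
```

Assigning each page to its most probable topic, as in the `__main__` block, is one common hard-clustering choice; comparing pages with `cosine` over their P(z | d) rows is another. Neither is confirmed as the paper's procedure. EM for PLSA also converges only to a local optimum, so the resulting clusters can depend on `seed`.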
Similar articles
Co-clustering for Weblogs in Semantic Space
Web clustering is an approach for aggregating web objects into various groups according to underlying relationships among them. Finding co-clusters of web objects in semantic space is an interesting topic in the context of web usage mining, which is able to capture the underlying user navigational interest and content preference simultaneously. In this paper we will present a novel web co-clust...
Personal Name Resolution of Web People Search
Disambiguating personal names in a set of documents (such as a set of web pages returned in response to a person name) is a difficult and challenging task. In this paper, we explore the extent to which the “cluster hypothesis” for this task holds (i.e., that similar documents tend to represent the same person). We explore two clustering techniques which used either (1) term based matching (sing...
Resolving Person Names in Web People Search
Disambiguating person names in a set of documents (such as a set of web pages returned in response to a person name) is a key task for the presentation of results and the automatic profiling of experts. With largely unstructured documents and an unknown number of people with the same name, the problem presents many difficulties and challenges. This chapter treats the task of person name disambig...
A Web Recommendation Technique Based on Probabilistic Latent Semantic Analysis
Web transaction data between Web visitors and Web functionalities usually convey user task-oriented behavior patterns. Mining this type of clickstream data captures usage pattern information. Nowadays, Web usage mining has become one of the most widely used methods for Web recommendation, which customizes Web content to the user's preferred style. Traditional techniques of Web usage m...
Discovering User Access Pattern Based on Probabilistic Latent Factor Model
There has been increased demand for characterizing user access patterns using web mining techniques, since the informative knowledge extracted from web server log files can benefit not only web site structure improvement but also the understanding of user navigational behavior. In this paper, we present a web usage mining method, which utilizes web user usage and page linkage...
Publication date: 2012